Quantification of population structure using correlated SNPs by shrinkage principal components.

Authors

  • Fei Zou
  • Seunggeun Lee
  • Michael R Knowles
  • Fred A Wright
Abstract

BACKGROUND/AIMS: Association studies using unrelated individuals have become the most popular design for mapping complex traits. One of the major challenges of association mapping is avoiding spurious association due to population stratification. Principal component analysis (PCA) on genome-wide marker genotypes is one of the most popular methods for controlling population stratification. It implicitly assumes that the markers are in linkage equilibrium, a condition that is rarely satisfied and that we plan to relax.

METHODS: We carefully examined the impact of linkage disequilibrium (LD) on PCA, and proposed a simple modification of the standard PCA to automatically adjust for the correlations among markers.

RESULTS: We demonstrated that LD patterns in genome-wide association datasets can distort the techniques for stratification control, showing 'subpopulations' that reflect localized LD phenomena rather than plausible population structure. We showed that the proposed method effectively removes the artifactual effect of LD patterns, and successfully recovers underlying population structure that is not apparent from standard PCA.

CONCLUSION: PCA is highly influenced by sets of SNPs with high LD, which obscures the true population substructure. Our shrinkage PCA applies to all available markers, regardless of the LD patterns. The proposed method is easier to implement than most existing LD-adjusted PCA methods.
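The abstract does not spell out the shrinkage estimator, so the following is only a minimal sketch of the general idea of down-weighting markers that are in LD with their neighbors before running PCA. The weighting rule (1 minus the largest r² with the preceding `window` SNPs), the function names `r_squared` and `ld_weights`, and the toy genotypes are all assumptions made for illustration; this is not the authors' method.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (coded 0/1/2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0 here
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_weights(snps, window=2):
    """Hypothetical weights: 1 - max r^2 with the preceding `window` SNPs.

    `snps` is a list of SNPs in genomic order; each SNP is a list of
    genotypes, one per individual. This rule is an illustrative assumption.
    """
    weights = []
    for j, snp in enumerate(snps):
        previous = snps[max(0, j - window):j]
        max_r2 = max((r_squared(snp, p) for p in previous), default=0.0)
        weights.append(1.0 - max_r2)
    return weights


# Toy data: three SNPs genotyped in six individuals.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to the first SNP, i.e. perfect LD
    [2, 0, 1, 1, 2, 0],  # only weakly correlated with the others
]
weights = ld_weights(snps)
```

In this toy case the duplicated SNP receives a weight of zero, so a block of redundant markers would contribute little to the covariance matrix that PCA decomposes, while the weakly correlated SNP keeps most of its weight. Multiplying each standardized SNP column by its weight before PCA is one way such weights could be applied.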

Related resources

PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure foun...

Ancestral Informative Marker Selection and Population Structure Visualization Using Sparse Laplacian Eigenfunctions

Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestral informative markers usually require the prior knowledge of individual ancestry and have difficulty for ...

Investigation of the population structure of Iranian native cattle using discriminant analysis of principal components

Effective management of genetic resources in domestic animals is based on characterizing the genetic structure and diversity among populations. Strategies that reduce the complexity and dimensionality of the data are required to analyze the genetic relationships between populations based on dense genomic data. The objective of this study was to use the discriminant analysis of principal components (DAPC)...

Tracing Sub-Structure in the European American Population with PCA-Informative Markers

Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals-307,315 autosomal SNPs). Individual variation lies across ...

Analysis of genetic diversity, phylogenetic relationships and population structure of Arasbaran cornelian cherry (Cornus mas L.) genotypes using ISSR molecular markers

Cornelian cherry (Cornus mas L.), considered the ancestor of cultivated trees in the Arasbaran region, is a medicinally and economically important plant species. However, little is known about the genetic diversity, breeding programs, and population structure of this species in this region. With this in view, the main objectives of the present study were to analyze the genetic diversity, phyloge...

Journal:
  • Human heredity

Volume 70, Issue 1

Pages: -

Publication date: 2010